Cross-genre Feature Comparisons for Spoken Sentence Segmentation
Authors
Abstract
Automatic sentence segmentation of spoken language is an important precursor to downstream natural language processing. Previous studies combine lexical and prosodic features, but can impose significant computational challenges because of the large size of feature sets. Little is understood about which features most benefit performance, particularly for speech data from different speaking styles. We compare sentence segmentation for speech from broadcast news versus natural multi-party meetings, using identical lexical and prosodic feature sets across genres. Results based on boosting and forward selection for this task show that (1) feature sets can be reduced with little or no loss in performance, and (2) the contribution of different feature types differs significantly by genre. We conclude that more efficient approaches to sentence segmentation and similar tasks can be achieved, especially if genre differences are taken into account.
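As a rough illustration of the forward selection mentioned in the abstract, the following is a minimal sketch of a generic greedy procedure, not the authors' implementation. The feature names, the `toy_score` function, and its weights are hypothetical placeholders invented for this example; in practice the score would come from evaluating a trained segmentation model on held-out data.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.01,
) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the single feature that
    raises the score the most, and stop when the best gain is below min_gain."""
    selected: List[str] = []
    remaining = list(candidates)
    best = score(frozenset())  # baseline score with no features
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best < min_gain:
            break  # no remaining feature helps enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical per-feature contributions (made up, NOT results from the paper).
TOY_WEIGHTS = {
    "pause_duration": 0.20,
    "word_ngram": 0.15,
    "pitch_reset": 0.05,
    "energy": 0.00,
}


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for a held-out evaluation metric: baseline plus additive gains."""
    return 0.5 + sum(TOY_WEIGHTS[f] for f in subset)


selected, best = forward_select(list(TOY_WEIGHTS), toy_score)
print(selected, round(best, 3))
```

With this toy scorer the uninformative `energy` feature is never added, which mirrors the abstract's point that a feature set can shrink with little or no loss in performance. Running the same procedure separately per genre would be one way to compare which feature types are picked first.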
Similar resources
Syntactic Feature of EFL Speakers’ Conference Presentations: The Case of Passive Voice and Pseudo-Cleft
Acquiring proficiency in academic genres is a key factor in the research community. Among the various genres in academic discourse communities, spoken genres, especially Conference Presentations (CPs), play a crucial role in research communities, though investigation of this important genre is still in its infancy and it remains relatively under-researched. Therefore, the present study aims to shed light on the import...
Automatic Segmentation for Emotional Feature Extraction from Spoken Sentence
Perception of a speaker's emotion is one of the interesting issues in human-robot interaction. In particular, a friendly and instinctive interface between robots and humans is required to make service robots useful to people who are inexperienced in interacting with robots. Among the several modes of communication, speech is the easiest for humans because it is the fundamental communication tool in human-human interaction. How...
An Open Source Prosodic Feature Extraction Tool
There has been increasing interest in utilizing a wide variety of knowledge sources to perform automatic tagging of speech events, such as sentence boundaries and dialogue acts. In addition to the words spoken, the prosodic content of the speech has proved quite valuable in a variety of spoken language processing tasks such as sentence segmentation and tagging, disfluency detect...
Orthographic variations and visual information processing.
Based upon an analysis of how graphemic symbols are mapped onto spoken languages, three distinctive writing systems with three different relations between script and speech are identified. They are logography, syllabary, and alphabet, developed sequentially in the history of mankind. It is noted that this trend of development seems to coincide with the trend of cognitive developme...
Dependency Parsing of Japanese Spoken Monologue Based on Clause Boundaries
Spoken monologues feature greater sentence length and structural complexity than do spoken dialogues. To achieve high parsing performance for spoken monologues, it could prove effective to simplify the structure by dividing a sentence into suitable language units. This paper proposes a method for dependency parsing of Japanese monologues based on sentence segmentation. In this method, the depen...